13 research outputs found

    Insights into mammalian adaptive evolution through genomics data

    Although the genome sequencing revolution is still in its infancy, we must acknowledge it as the major driver of biology since the beginning of the 21st century. The availability of a large collection of complete mammalian genomes due to high-throughput sequencing technologies allows us to begin the exploration of how the evolutionary diversification of gene content reflects the ecological adaptations of different taxa. Novelty arises in evolution through the transformation or combination of existing systems and, as shown recently, also from scratch, with new genes arising from previously non-coding regions. This thesis is centered around these different mechanisms of evolutionary innovation. It includes a common methodological part in which we propose a simple method to optimize multiple alignments and examine its effect in positive selection analyses, the exploration of the origin and evolution of mammalian-specific genes, and the study of gene regulation in mammalian adaptations (e.g. hibernation) using high-throughput technologies.

    Diverse families of transposable elements affect the transcriptional regulation of stress-response genes in Drosophila melanogaster

    Although transposable elements are an important source of regulatory variation, their genome-wide contribution to the transcriptional regulation of stress-response genes has not been studied yet. Stress is a major aspect of natural selection in the wild, leading to changes in the transcriptional regulation of a variety of genes that are often triggered by one or a few transcription factors. In this work, we take advantage of the wealth of information available for Drosophila melanogaster and humans to analyze the role of transposable elements in six stress regulatory networks: immune, hypoxia, oxidative, xenobiotic, heat shock, and heavy metal. We found that transposable elements were enriched for caudal, dorsal, HSF, and tango binding sites in D. melanogaster and for NFE2L2 binding sites in humans. Taking into account the D. melanogaster population frequencies of transposable elements with predicted binding motifs and/or binding sites, we showed that those containing three or more binding motifs/sites are more likely to be functional. For a representative subset of these TEs, we performed in vivo transgenic reporter assays in different stress conditions. Overall, our results showed that TEs are relevant contributors to the transcriptional regulation of stress-response genes.
    Funding: Ministerio de Economia y Competitividad [BFU2014-57779-P]; Ministerio de Ciencia, Innovación y Universidades/AEI [BFU2017-82937-P]; European Commission [H2020-ERC-2014-CoG-647900]. Funding for open access charge: European Commission [H2020-ERC-2014-CoG-647900].

    Improving genome-wide scans of positive selection by using protein isoforms of similar length

    Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, one single-protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of nonhomologous regions, such as those derived from species-specific exons, increasing the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments in which the only difference is the method used to select the protein isoform combination: Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both nonsynonymous and synonymous substitution rates when compared with PALO, which is most likely due to an excess of misaligned positions. The estimation of the fraction of genes that have experienced positive selection by maximum likelihood is very sensitive to the method of isoform selection employed, both when alignments are constructed with MAFFT and with Prank(+F). Longest performs better than a random combination but still estimates up to 3 times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives. 
We show that PALO can eliminate the majority of such false positives and thus that it is a more appropriate approach for large-scale analyses than Longest. A web server has been set up to facilitate the use of PALO given a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
    Funding: This work was funded by Ministerio de Economía y Competitividad (FPI BES-2010-038494 to J.L.V.-C., Plan Nacional BIO2009-08160 and BFU2012-36820) and Fundació ICREA to M.M.A.
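
    The isoform-selection idea described in this abstract can be illustrated with a minimal sketch. It is not the PALO implementation: it assumes that "most similar in length" means the smallest difference between the longest and shortest selected protein, and it uses an exhaustive search that is only practical for small families. The function names, the input layout, and the example lengths are all hypothetical.

```python
from itertools import product


def select_similar_length_isoforms(family):
    """Pick one isoform per gene so that the selected lengths are as close as possible.

    family: dict mapping gene name -> dict of isoform id -> protein length.
    Assumption: similarity is scored as max(length) - min(length) of the selection.
    Returns (dict gene -> chosen isoform id, length spread of that selection).
    """
    genes = sorted(family)
    best_combo, best_spread = None, None
    # Exhaustive search over every one-isoform-per-gene combination.
    for combo in product(*(sorted(family[g].items()) for g in genes)):
        lengths = [length for _, length in combo]
        spread = max(lengths) - min(lengths)
        if best_spread is None or spread < best_spread:
            best_combo, best_spread = combo, spread
    chosen = {g: iso for g, (iso, _) in zip(genes, best_combo)}
    return chosen, best_spread


def select_longest_isoforms(family):
    """Baseline described in the abstract: the longest isoform of each gene."""
    return {g: max(isos, key=isos.get) for g, isos in family.items()}


# Hypothetical gene family with made-up isoform lengths (amino acids).
example_family = {
    "human": {"h1": 520, "h2": 410},
    "mouse": {"m1": 405, "m2": 300},
    "dog": {"d1": 415},
}

similar, spread = select_similar_length_isoforms(example_family)
longest = select_longest_isoforms(example_family)
print(similar, spread)
print(longest)
```

    In this toy family the longest-isoform baseline keeps the 520-residue human protein, which is about 100 residues longer than the other two sequences, whereas the length-similarity selection keeps the 410-residue isoform and so avoids the extra unaligned region.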

    Translation of neutrally evolving peptides provides a basis for de novo gene evolution

    Accumulating evidence indicates that some protein-coding genes have originated de novo from previously non-coding genomic sequences. However, the processes underlying de novo gene birth are still enigmatic. In particular, the appearance of a new functional protein seems highly improbable unless there is already a pool of neutrally evolving peptides that are translated at significant levels and that can at some point acquire new functions. Here, we use deep ribosome-profiling sequencing data, together with proteomics and single nucleotide polymorphism information, to search for these peptides. We find hundreds of open reading frames that are translated and that show no evolutionary conservation or selective constraints. These data suggest that the translation of these neutrally evolving peptides may be facilitated by the chance occurrence of open reading frames with a favourable codon composition. We conclude that the pervasive translation of the transcriptome provides plenty of material for the evolution of new functional proteins.
    Acknowledgements and funding: We are grateful for valuable discussions with many colleagues during this study. This work was funded by grants BFU2012-36820, BFU2015-65235-P and TIN2015-69175-C4-3-R from Ministerio de Economía e Innovación (Spanish Government) and co-funded by FEDER (EC). We also received funding from Agència de Gestió d’Ajuts Universitaris i de Recerca Generalitat de Catalunya (AGAUR), grant no. 2014SGR1121.

    New genes and functional innovation in mammals

    The birth of genes that encode new protein sequences is a major source of evolutionary innovation. However, we still understand relatively little about how these genes come into being and which functions they are selected for. To address these questions, we have obtained a large collection of mammalian-specific gene families that lack homologues in other eukaryotic groups. We have combined gene annotations and de novo transcript assemblies from 30 different mammalian species, obtaining ∼6,000 gene families. In general, the proteins in mammalian-specific gene families tend to be short and depleted in aromatic and negatively charged residues. Proteins which arose early in mammalian evolution include milk and skin polypeptides, immune response components, and proteins involved in reproduction. In contrast, the functions of proteins which have a more recent origin remain largely unknown, despite the fact that these proteins also have extensive proteomics support. We identify several previously described cases of genes originated de novo from noncoding genomic regions, supporting the idea that this mechanism frequently underlies the evolution of new protein-coding genes in mammals. Finally, we show that most young mammalian genes are preferentially expressed in testis, suggesting that sexual selection plays an important role in the emergence of new functional genes.
    Funding: The work was funded by grants BFU2012-36820 and BFU2015-65235-P from Ministerio de Economía e Innovación (Spanish Government) and co-funded by FEDER. We also received funding from Agència de Gestió d’Ajuts Universitaris i de Recerca Generalitat de Catalunya (AGAUR), grant number 2014SGR112

    De novo assembly and functional annotation of blood transcriptome of loggerhead turtle, and in silico characterization of peroxiredoxins and thioredoxins

    The aim of this study was to generate and analyze the atlas of the loggerhead turtle blood transcriptome by RNA-seq, and to identify and characterize thioredoxin (Txns) and peroxiredoxin (Prdxs) antioxidant enzymes, which are of the greatest interest in the control of peroxide levels and other biological functions. The transcriptome of loggerhead turtle was sequenced using the Illumina HiSeq 2000 platform and de novo assembly was performed using the Trinity pipeline. The assembly comprised 515,597 contigs with an N50 of 2,631 bp. Contigs were analyzed with CD-Hit, obtaining 374,545 unigenes, of which 165,676 had ORFs encoding putative proteins longer than 100 amino acids. A total of 52,147 (31.5%) of these transcripts had significant homology matches in at least one of the five databases used. From the enrichment of GO terms, 180 proteins with antioxidant activity were identified, among these 28 Prdxs and 50 putative Txns. The putative proteins of loggerhead turtles encoded by the genes Prdx1, Prdx3, Prdx5, Prdx6, Txn and Txnip were predicted and characterized in silico. When comparing Prdxs and Txns of loggerhead turtle with homologous human proteins, they showed 18 (9%), 52 (18%), 94 (43%), 36 (16%), 35 (33%) and 74 (19%) amino acid mutations, respectively. However, they showed high conservation in active sites and structural motifs (98%), with few specific modifications. Of these, Prdx1, Prdx3, Prdx5, Prdx6, Txn and Txnip presented 0, 25, 18, 3, 6 and 2 deleterious changes, respectively. This study provides a high quality blood transcriptome and functional annotation of loggerhead sea turtles.
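
    For readers unfamiliar with the N50 statistic quoted for this assembly, the sketch below shows the standard definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the example lengths are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if 2 * running >= total:
            return length


# Made-up contig lengths: total 290 bp, so the half-way point is 145 bp.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

    With these example lengths the running sum reaches 150 bp at the second contig, so the function returns 70.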

    Stress response, behavior, and development are shaped by transposable element-induced mutations in Drosophila

    Most of the current knowledge on the genetic basis of adaptive evolution is based on the analysis of single nucleotide polymorphisms (SNPs). Despite increasing evidence for their causal role, the contribution of structural variants to adaptive evolution remains largely unexplored. In this work, we analyzed the population frequencies of 1,615 Transposable Element (TE) insertions annotated in the reference genome of Drosophila melanogaster, in 91 samples from 60 worldwide natural populations. We identified a set of 300 polymorphic TEs that are present at high population frequencies, and located in genomic regions with high recombination rate, where the efficiency of natural selection is high. The age and the length of these 300 TEs are consistent with relatively young and long insertions reaching high frequencies due to the action of positive selection. In addition, we identified a set of 21 fixed TEs also likely to be adaptive. Indeed, we, and others, found evidence of selection for 84 of these reference TE insertions. The analysis of the genes located near these 84 candidate adaptive insertions suggested that the functional response to selection is related to the GO categories of response to stimulus, behavior, and development. We further showed that a subset of the candidate adaptive TEs affects expression of nearby genes, and five of them have already been linked to an ecologically relevant phenotypic effect. Our results provide a more complete understanding of the genetic variation and the fitness-related traits relevant for adaptive evolution. Similar studies should help uncover the importance of TE-induced adaptive mutations in other species as well.

    Hormone-control regions mediate steroid receptor-dependent genome organization

    In breast cancer cells, some topologically associating domains (TADs) behave as hormonal gene regulation units, within which gene transcription is coordinately regulated in response to steroid hormones. Here we further describe that responsive TADs contain 20- to 100-kb-long clusters of intermingled estrogen receptor (ESR1) and progesterone receptor (PGR) binding sites, hereafter called hormone-control regions (HCRs). In T47D cells, we identified more than 200 HCRs, which are frequently bound by unliganded ESR1 and PGR. These HCRs establish steady long-distance inter-TAD interactions between them and organize characteristic looping structures with promoters in their TADs even in the absence of hormones in ESR1+-PGR+ cells. This organization is dependent on the expression of the receptors and is further dynamically modulated in response to steroid hormones. HCRs function as platforms that integrate different signals, resulting in some cases in opposite transcriptional responses to estrogens or progestins. Altogether, these results suggest that steroid hormone receptors act not only as hormone-regulated sequence-specific transcription factors but also as local and global genome organizers.
    Funding: We received funding from the European Research Council under the European Union's Seventh Framework Program (FP7/2007–2013)/ERC Synergy grant agreement 609989 (4DGenome). The content of this manuscript reflects only the author's views and the Union is not liable for any use that may be made of the information contained therein. We acknowledge support of the Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013–2017’ and Plan Nacional (SAF2016-75006-P), as well as support of the CERCA Programme/Generalitat de Catalunya.